Friday, 18th of October (2018)

T2D and Associated Traits

Marullo, El-Sayed Moustafa, & Prokopenko (2014)

A lot of SNPs discovered, but …

Flannick & Florez (2016)

Glycaemia and Type 2 Diabetes

A Genetic Joint Effect?

"In all cases, the glucose-raising allele was associated with increased risk of T2D, yet fasting glucose effect sizes and T2D ORs were weakly correlated"

Scott et al. (2012)

Glycaemia and Type 2 Diabetes

Scott et al. (2012)

Glycaemia and Type 2 Diabetes

Yaghootkar & Frayling (2013)

Joint Model

What for?
&
What is it?

What for?

  • To identify biomarkers relevant to a disease

  • To identify the effect of a treatment on a disease
    (independently of the association between the disease and the biomarker)

For example:

  • Biomarker: fasting glucose
  • Event: Type 2 Diabetes

Let's start with the Cox Model…

(Extended) Cox model: \[\begin{align} \lambda_i(t)=\lambda_0(t) \exp(\beta Y_i(t) + \alpha Z_i + \eta W_i) \end{align}\]

where:

  • \(\lambda_i(t)\) is the hazard function at time \(t\) for individual \(i\);
  • \(\lambda_0(t)\) is the unspecified baseline hazard function;
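  • \(\alpha\) measures the effect of \(Z_i\) (e.g., a genotype) on the hazard function;
  • \(\eta\) measures the effect of the adjusting covariates \(W_i\) on the hazard function;
  • \(\beta\) measures the association between the trajectory function \(Y_i(t)\) and the hazard function.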

It works, but biomarkers…

  1. are measured only at discrete time points (\(t_{ij}\))
  2. can have missing values over time
    • \(\Rightarrow\) Imputation?
    • \(\Rightarrow\) Introduces bias
  3. are measured with some degree of error
    • \(\Rightarrow\) Noise in the biomarker trajectory (\(Y_i(t_{ij}) \neq X_i(t_{ij})\))
  4. can be endogenous
    • \(\Rightarrow\) The trajectory can change when the event occurs
    • \(\Rightarrow\) Introduces bias

Fortunately, the Mixed Model is there!

(Generalised) linear mixed effect (LME) model: \[Y_{i}(t_{ij})=X_{i}(t_{ij})+\epsilon_{i}(t_{ij})\]

where:

  • \(Y_{i}(t_{ij})\) is the observed value
  • \(X_{i}(t_{ij})\) is the true (unobserved) value of the longitudinal measurement at time \(t_{ij}\) for individual \(i\).
  • \(\epsilon_{i}(t_{ij})\) is a random error term, usually:
    \[\epsilon_{i}(t_{ij})\sim \mathcal{N}(0,\sigma^2)\]

What can we do with a Joint Model?

Simultaneously test the effect of \(Z\) on:

  • a biomarker (\(\gamma\));
  • an event (\(\alpha\));
  • a biomarker and an event (\(\beta\gamma+\alpha\)).

The best part?

A gain in statistical power to detect those effects

  • if \(\beta\neq0\) (to detect a joint effect of \(Z\): \(\beta\gamma+\alpha\neq0\)) (Chen, Ibrahim, & Chu, 2011);
  • compared to the extended Cox model.
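
To see where the joint effect \(\beta\gamma+\alpha\) comes from, here is a short derivation sketch. It assumes the linear trajectory used later in these slides, \(X_i(t)=\theta_{0i}+\theta_{1i}t+\gamma Z_i+\delta W_i\), and a hazard that depends on the true trajectory \(X_i(t)\) rather than on the observed \(Y_i(t)\):

\[\begin{align} \lambda_i(t)&=\lambda_0(t) \exp\{\beta X_i(t) + \alpha Z_i + \eta W_i\}\\ &=\lambda_0(t) \exp\{\beta(\theta_{0i}+\theta_{1i}t) + (\beta\gamma+\alpha) Z_i + (\beta\delta+\eta) W_i\} \end{align}\]

Under these assumptions, \(Z\) acts on the log-hazard directly (\(\alpha\)) and indirectly through the biomarker (\(\beta\gamma\)), so its total effect is \(\beta\gamma+\alpha\).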

What is a Joint Model? With a picture!

What is a Joint Model? With equations!

The standard (joint likelihood) formulation involves two components:

  • a longitudinal component
  • a time-to-event (survival) component.

Let:

  • \(n\), the sample size;
  • \(i\), an individual (\(i=1,\cdots,n\));
  • \(m_i\), the number of measurements on individual \(i\);
  • \(t_{ij}\), a time point (\(j=1,\cdots,m_i\)).

The longitudinal component

(Generalised) linear mixed effect (LME) model: \[Y_{i}(t_{ij})=X_{i}(t_{ij})+\epsilon_{i}(t_{ij})\] where \(X_{i}(t_{ij})\) is the trajectory function, which could be defined as: \[\begin{gather}X_{i}(t_{ij})=\theta_{0i} + \theta_{1i}t_{ij} + \cdots + \theta_{pi}t_{ij}^p &, & \boldsymbol\theta_i \sim \mathcal{N}_{p+1}(\boldsymbol\mu, \boldsymbol\Sigma)\end{gather}\]

For simplicity here, we assume linearity over time (\(\theta_{0i}+\theta_{1i}t_{ij}\)): \[\begin{gather}Y_{i}(t_{ij})=\theta_{0i}+\theta_{1i}t_{ij}+\gamma Z_i+\delta W_i+\epsilon_{ij} &, & \boldsymbol{\theta} \sim \mathcal{N}_2 (\boldsymbol{\mu},\boldsymbol{\Sigma})\end{gather}\]

The longitudinal component

(Generalised) linear mixed effect (LME) model: \[Y_{i}(t_{ij})=\theta_{0i}+\theta_{1i}t_{ij}+\gamma Z_i+\delta W_i+\epsilon_{ij}\]

where:

  • \(Y_{i}(t_{ij})\), the observed value;
  • \(X_{i}(t_{ij})\), the true (unobserved) value of the longitudinal measurement at time \(t_{ij}\) for individual \(i\);
  • \(\epsilon_{ij}\) is a random error term, usually: \(\epsilon_{ij}\sim \mathcal{N}(0,\sigma^2)\);
  • \(Z_i\), a vector denoting the genotype of individual \(i\);
  • \(W_i\), a set of adjusting covariates.
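
As an illustration of the longitudinal component, here is a minimal Python sketch that simulates trajectories without covariates \(W_i\). The numerical values are those of the parameter table given later in these slides; the visit schedule, the additive 0/1/2 genotype coding under Hardy-Weinberg proportions, and all function names are assumptions made for this example only.

```python
import math
import random

# Values from the parameter table in these slides (rs17747324, D.E.S.I.R.).
MU = (4.55, 0.0108)                  # mean intercept and slope
SIGMA = ((0.143, -0.00109),
         (-0.00109, 6.8e-4))         # covariance of (theta_0i, theta_1i)
GAMMA = 0.0229                       # SNP effect on the biomarker
SD_EPS = 0.305                       # residual standard deviation
MAF = 0.244                          # minor allele frequency


def draw_random_effects(rng):
    """Draw (theta_0i, theta_1i) from N_2(MU, SIGMA) via a 2x2 Cholesky factor."""
    l11 = math.sqrt(SIGMA[0][0])
    l21 = SIGMA[1][0] / l11
    l22 = math.sqrt(SIGMA[1][1] - l21 ** 2)
    z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
    return MU[0] + l11 * z0, MU[1] + l21 * z0 + l22 * z1


def draw_genotype(rng):
    """Assumption: number of minor alleles (0/1/2) under Hardy-Weinberg."""
    return sum(rng.random() < MAF for _ in range(2))


def simulate_individual(rng, times):
    theta0, theta1 = draw_random_effects(rng)
    z = draw_genotype(rng)
    # True (unobserved) trajectory X_i(t_ij), then noisy observations Y_i(t_ij).
    x = [theta0 + theta1 * t + GAMMA * z for t in times]
    y = [xi + rng.gauss(0, SD_EPS) for xi in x]
    return {"theta": (theta0, theta1), "z": z, "x": x, "y": y}


rng = random.Random(2018)
visit_times = [0, 3, 6, 9]   # hypothetical schedule for m = 4 measures
cohort = [simulate_individual(rng, visit_times) for _ in range(1000)]
```

Each element of `cohort` keeps both the true trajectory `x` and the observed values `y`, which makes the measurement error \(\epsilon_{ij}\) explicit.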

The time-to-event component

(Extended) Cox model (proportional hazards): \[\begin{align} \lambda_i(t)&=\lim_{dt \to 0} \frac{P\{t\leq T_i<t+dt|T_i\geq t, \bar{Y_i}(t), Z_i, W_i\}}{dt}\\ &=\lambda_0(t) \exp\{\beta X_{i}(t) + \alpha Z_i + \eta W_i\} \end{align}\]

where:

  • \(\bar{Y_i}(t)=\{Y_i(u),0 \leq u \leq t\}\), the history of the trajectory;
  • \(T_i\), the event time for individual \(i\);
  • \(C_i\), the right censoring time (end of the follow-up);
  • \(\Delta_i\), an event indicator: \(\begin{cases} \Delta_i=0, & \text{if }\ T_i>C_i. \\ \Delta_i=1, & \text{if }\ T_i \leq C_i. \end{cases}\)
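
In practice only the earlier of \(T_i\) and \(C_i\) is observed, together with \(\Delta_i\). A minimal Python sketch of that bookkeeping (the function name `observe` is hypothetical):

```python
def observe(event_time, censoring_time):
    """Return (observed time, Delta_i) under right censoring."""
    # Delta_i = 1 when the event happens no later than the censoring time.
    delta = 1 if event_time <= censoring_time else 0
    return min(event_time, censoring_time), delta
```

For example, with a follow-up ending at 9 years, `observe(4.2, 9.0)` gives `(4.2, 1)`, whereas `observe(12.5, 9.0)` gives `(9.0, 0)`.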

Hypothesis testing

Hypotheses: \(\begin{cases}H_0&:& \theta=\theta_0\\H_1&:& \theta\neq\theta_0\end{cases}\)

  • Likelihood Ratio Test
    \(LRT=-2\{\ell(\hat{\theta}_0)-\ell(\hat{\theta})\}\)

  • Wald Test
    \(W=(\hat{\theta}-\theta_0)^\top \mathcal{I}(\hat{\theta})(\hat{\theta}-\theta_0)\)
    (univariate: \((\hat{\theta}_j-\theta_{0j})/\widehat{s.e.}(\hat{\theta}_j)\))

  • Score Test
    \(U=S^\top(\hat{\theta}_0)\{\mathcal{I}(\hat{\theta}_0)\}^{-1}S(\hat{\theta}_0)\)
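
For a single parameter, the Wald and likelihood ratio statistics above can be turned into p-values with the standard library alone. This is a sketch assuming the usual asymptotic reference distributions (standard normal for the univariate Wald statistic, chi-squared with 1 degree of freedom for the LRT); the function names are hypothetical.

```python
import math


def wald_z_pvalue(estimate, null_value, std_error):
    """Univariate Wald test: z statistic and two-sided p-value."""
    z = (estimate - null_value) / std_error
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def lrt_pvalue_1df(loglik_null, loglik_alt):
    """Likelihood ratio test with 1 degree of freedom."""
    lrt = -2.0 * (loglik_null - loglik_alt)
    # Chi-squared (1 df) tail: P(X > x) = erfc(sqrt(x / 2)).
    return lrt, math.erfc(math.sqrt(max(lrt, 0.0) / 2.0))
```

With one degree of freedom, a Wald statistic \(z\) and an LRT statistic equal to \(z^2\) give the same p-value, which is a convenient sanity check.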

Is the Joint Model worth it?

Let's find out with simulations!

Estimators (& Computation time)

  • Are the Joint Model estimators good?
    Bias, variance and RMSE (Root-Mean Square Error)
    \[\begin{align} \operatorname{MSE}(\hat\phi)&= \operatorname{Bias}(\hat\phi)^2 + \operatorname{Var}(\hat\phi)\\ \operatorname{RMSE}(\hat{\phi})&=\sqrt{\operatorname{MSE}(\hat\phi)}\\ &=\sqrt{E\{(\hat{\phi}-\phi)^2\}} \end{align}\] \[\phi=(\beta, \gamma, \alpha)\]

  • Can we do a whole genome analysis…
    … in a reasonable time frame?
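
The bias, variance and RMSE defined above can be computed from the estimates collected over simulation replicates. A minimal Python sketch for one component of \(\phi\) (the function name and the example numbers in the note below are hypothetical):

```python
import math


def summarise_estimator(estimates, true_value):
    """Bias, variance and RMSE of an estimator over simulation replicates."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    # Divide by n (not n - 1) so that MSE = bias^2 + variance holds exactly.
    variance = sum((e - mean) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return {"bias": bias, "variance": variance, "rmse": math.sqrt(mse)}
```

For instance, `summarise_estimator([3.0, 3.2, 3.4], 3.17)` summarises three made-up estimates of \(\beta\) against the true value 3.17 used in these slides.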

Let's find a more "naive" approach!

What if we split the job in two?

\(\Rightarrow\) "Two-Step" (Tsiatis, DeGruttola, & Wulfsohn, 1995)?

  1. (Generalised) linear mixed effect (LME) model
    \[\begin{align} Y_{i}(t)&=X_i(t)+ \epsilon_{i}(t)\\ X^*_{i}(t)&=E\{X_{i}(t)|\bar{Y_i}(t), T_i\geq t\} \end{align}\]

  2. (Extended) Cox model (proportional hazards)
    \[\lambda_i(t)=\lambda_0(t) \exp\{\beta X^*_{i}(t)\}\]

Time to generate fake data!

Let's keep it simple, i.e., without covariates:

  • the trajectory: \(Y_{i}(t)=\theta_{0i} + \theta_{1i}t + \gamma Z_i + \epsilon_{i}(t)\)

  • the event: \(\lambda_i(t)=\lambda_0(t) \exp\{\beta X_{i}(t) + \alpha Z_i\}\)

  • the event time, obtained by inverting the cumulative hazard under a constant (exponential) baseline hazard (Austin, 2012):

    \[\begin{gather} H_i(T_i)=\int_0^{T_i}\lambda_0(t) \exp(\beta X_i(t)+\alpha Z_i)dt , & \lambda_0(t)=\lambda\\ F_i(T_i)=1-\exp(-H_i(T_i))=u , & u\sim\mathcal{U}(0, 1) \end{gather}\] \[T_i=\frac{1}{\beta\theta_{1i}}\log\left(1-\frac{\beta\theta_{1i}\times \log(1-u)}{\lambda \exp(\beta\theta_{0i}+(\beta\gamma+\alpha)Z_i)}\right)\]
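
The closed form for \(T_i\) can be coded directly. The sketch below is one possible Python version, not the code behind these slides; the handling of the two edge cases (\(\beta\theta_{1i}\) close to zero, and a cumulative hazard that never reaches the target) is an assumption added here, as are the function names.

```python
import math


def cumulative_hazard(t, theta0, theta1, z, beta, gamma, alpha, lam):
    """H_i(t) for a linear trajectory and a constant baseline hazard lam."""
    rate = lam * math.exp(beta * theta0 + (beta * gamma + alpha) * z)
    slope = beta * theta1
    if abs(slope) < 1e-12:
        return rate * t
    return rate * math.expm1(slope * t) / slope


def event_time(u, theta0, theta1, z, beta, gamma, alpha, lam):
    """Invert F_i(T_i) = u to obtain T_i, with u in [0, 1)."""
    rate = lam * math.exp(beta * theta0 + (beta * gamma + alpha) * z)
    target = -math.log(1.0 - u)      # required cumulative hazard H_i(T_i)
    slope = beta * theta1
    if abs(slope) < 1e-12:           # limiting case: constant hazard
        return target / rate
    arg = 1.0 + slope * target / rate
    if arg <= 0.0:                   # hazard decays too fast: no event
        return math.inf
    return math.log(arg) / slope
```

Drawing `u` with `random.random()` for each individual then yields simulated event times; a returned `math.inf` means the individual never experiences the event and will be censored.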

Set the parameters!

Scott et al. (2012)

Yaghootkar & Frayling (2013)

Set the parameters!

Parameters and numerical values used for the sensitivity analysis and simulations, based on results from rs17747324 within the TCF7L2 gene in the French D.E.S.I.R. cohort.

| Parameters | Values |
|---|---|
| Number of participants (\(n\)) | 4,352 |
| Number of measures (\(m\)) | 4 |
| Diabetes incidence rate (\(d\)) | 0.0384 |
| Minor allele frequency (\(f\)) | 0.244 |
| Random effects (\(\theta\)) | \(\sim\mathcal{N}_2\left (\begin{bmatrix}4.55\\0.0108\end{bmatrix} , \begin{bmatrix} 0.143 & -0.00109 \\ -0.00109 & 6.8\times 10^{-4} \end{bmatrix} \right )\) |
| SNP effect on \(Y_{ij}\) (\(\gamma\)) | 0.0229 |
| SNP effect on \(T_i\) (\(\alpha\)) | 0.265 |
| Association between \(Y_{ij}\) and \(T_i\) (\(\beta\)) | 3.17 |
| Error term (\(\epsilon\)) | \(\sim\mathcal{N}(0,0.305^2)\) |
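
As a quick worked check, the joint effect of the SNP implied by these values is:

\[\begin{align} \beta\gamma+\alpha &= 3.17\times 0.0229+0.265\\ &\approx 0.0726+0.265\\ &\approx 0.338 \end{align}\]

On the hazard-ratio scale this gives \(\exp(0.338)\approx 1.40\), against \(\exp(0.265)\approx 1.30\) for the direct effect alone. Reading these as per-allele hazard ratios assumes an additive genotype coding, which is not stated in the table.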

Me?

References

Austin, P. C. (2012). Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Statistics in Medicine, 31(29), 3946–3958. https://doi.org/10.1002/sim.5452

Chen, L. M., Ibrahim, J. G., & Chu, H. (2011). Sample size and power determination in joint modeling of longitudinal and survival data. Statistics in Medicine, 30(18), 2295–2309. https://doi.org/10.1002/sim.4263

Flannick, J., & Florez, J. C. (2016). Type 2 diabetes: Genetic data sharing to advance complex disease research. Nature Reviews Genetics, advance online publication. https://doi.org/10.1038/nrg.2016.56

Marullo, L., El-Sayed Moustafa, J. S., & Prokopenko, I. (2014). Insights into the Genetic Susceptibility to Type 2 Diabetes from Genome-Wide Association Studies of Glycaemic Traits. Current Diabetes Reports, 14(11). https://doi.org/10.1007/s11892-014-0551-8

Scott, R. A., Lagou, V., Welch, R. P., Wheeler, E., Montasser, M. E., Luan, J., … Barroso, I. (2012). Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nature Genetics, 44(9), 991–1005. https://doi.org/10.1038/ng.2385

Tsiatis, A. A., DeGruttola, V., & Wulfsohn, M. S. (1995). Modeling the Relationship of Survival to Longitudinal Data Measured with Error. Applications to Survival and CD4 Counts in Patients with AIDS. Journal of the American Statistical Association, 90(429), 27–37. https://doi.org/10.2307/2291126

Yaghootkar, H., & Frayling, T. M. (2013). Recent progress in the use of genetics to understand links between type 2 diabetes and related metabolic traits. Genome Biology, 14(3), 203.